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Abstract 

The boundary layer of a finite domain [a, b] covers mesoscopic lat- 
eral neighbourhoods, inside [a, b], of the endpoints a and b. The correct 
diagnostic of the integrand behaviour at a and b, based on its sam- 
pling inside the boundary layer, is the first from a set of hierarchically 
ordered criteria allowing a priori Bayesian inference on efficient mesh 
generation in automatic adaptive quadrature. 

1 Introduction 

The boundary layer problem in numerical quadrature is the first, and prob- 
ably the most difficult to solve, from a Bayesian chain aiming either to im- 
plement the automatic mesh generation by the adaptive quadrature methods 
under sound expectation of reliable local quadrature rule {q, e} outputs, or 
to provide early detection of the origins of code failure. 

The need of such a stringent requirement, which goes well beyond the 
usual implementation of the automatic adaptive quadrature rules (see, e.g., 
[Ij ^ El) arises in the numerical exploration of the predictions of models 
describing phase transitions in complex physical systems, which critically 
depend on the realization of (unknown in advance) values of some specific 
parameters (like, e.g., the hole or electron doping level in high-Tc supercon- 
ducting materials The solution of the resulting parametric integrals 
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makes use of the existing library programs the rehabihty of the outputs of 
which is heavily based on user's ability to choose the suitable procedure from 
a proposed menu. The impossibility to know in advance the detailed be- 
haviour of the integrand function over the whole class of parametric integrals 
forces the use of a trial and error approach which may result in unnoticed 
unreliable {q, e} pairs and, consequently, bad output failure. 

The a posteriori assessment of the reliability of the {q, e} pair over the 
current integration subrange IH] solves only half of the problem since the 
code remains highly inefficient. The a priori verification of the conditioning 
of the integrand profile at the quadrature knots, which was proposed by us 
some time ago [7j needed in fact, a whole set of hierarchically ordered criteria 
providing Bayesian inference on the status of the gradually generated 
integrand profile at newly added quadrature knots. 

The root of the resulting Bayesian inference decision tree is the diagnostic 
of the behaviour of the integrand function f{x) at the boundaries a and b of 
the finite integration domain [a,b]. To set it, a suitable integrand sampling 
is required inside a mesoscopic neighbourhood of the boundary layer of [a,b]. 

The present paper generalizes the analysis done in over minimal four 
point partitions inside the mesoscopic regions associated to each of the ends a 
and b of [a, b]. By allowing an unrestricted four point partition, the diagnostic 
failures stemming from the inadequacy of a frozen integrand sampling are 
avoided, resulting in analysis reliability enhancement for difficult integrand 
functions. 

2 Diagnostics and Bayesian inferences from 
the boundary layer analysis 

Let Xr denote either fl{a) or fl{b), the fioating point representations of the 
endpoints a and b of [a, b] . The analysis establishes the status of the value 
fxr = fKfi^r)), the floating point representation of the computed value of 
the integrand, f{xr) G M, as follows: 

(i) Diagnostic: Smooth integrand behaviour. 

Bayesian inference: Regular quadrature knot mesh over [a, b] can be 
generated starting from the Xr endpoint. 

Supplementary information: The analysis also generates an estimate of 
the lateral derivative f'{xr) inside [a, h\. This will serve to the formula- 
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tion of acceptance check criteria for the two quadrature knots which he 
nearest and next nearest to ,x,.. In case of rejection, the decision to per- 
form the immediate subrange subdivision of [a,b], without generating 
the integrand profile at the remaining quadrature knots is taken. 

(ii) Diagnostic: Endpoint or outer singularity of f{x) or its derivatives. 
Bayesian inference: Slow convergence is to be expected under subrange 
subdivision. Use of a specific subrange subdivision procedure based on 
bisection together with a convergence acceleration procedure (e.g., the 
epsilon extrapolation algorithm) are a must. 

(iii) Diagnostic: Inner nearby singularity. 

Bayesian inference: The occurrence of an offending inner singular point 
Xo near Xf is to be further confirmed. Under affirmative diagnostic, Xq 
is to be located to machine accuracy. The input integral is then split 
into two integrals, over [a,xo) and {xo,b], each being further processed 
following the procedure adequate for endpoint singularity. 

(iv) Diagnostic: Inner nearby finite jump. 

Bayesian inference: Further confirmation of the existence of an of- 
fending inner jump point Xq near Xr is necessary. Under affirmative 
diagnostic, Xq is to be located to machine accuracy. The input integral 
is then split into two integrals, over [a, Xq) and {xq , b], with /(^c^J") and 
/'(xq") taking values equal to the lateral limits of f{x) and f'{x) at xq. 
The resulting integrals over the two subintervals are to be solved for 
smooth f{x). 

(v) Diagnostic: Irregular behaviour. 

Bayesian inference: The output of the automatic procedure could 
hardly be taken for reliable. Clarification of the offending integrand 
behaviour is a must. 

(vi) Diagnostic: Smooth integrand behaviour at both ends a and b. 

Bayesian inference: Early check for the presence of an oscillatory or 
odd integrand is useful. If an affirmative diagnostic is issued, then 
define a ceiling accuracy of the expected output, originating in severe 
precision loss due to heavy cancellation by subtraction. 
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3 Unrestricted least squares analysis 



Theorem 1 The function f :[a,b] C M — M, / = f{x), is smooth inside a 
lateral mesoscopic neighbourhood V{xr) C [a,b] of the reference abscissa Xr 
denoting the floating point representation of either the end a or the end b of 
[a,b], provided the computed values of the first order divided differences of 
f{x) over any abscissa sampling inside V{xr) are independent on the choice 
of the samphng abscissas. 

Remark 1 The result stated in Theorem^ essentially follows from the prop- 
erty that, for any reference abscissa Xr & D G W of a continuous twice dif- 
ferentiable function f : D M, / = f{x), a nonvanishing neighbourhood 
V{xr) C D does exist inside which the hnear Taylor series expansion of f{x) 
around f (xr ) holds true within some predefined accuracy threshold < £ <^ 1. 

The numerical check of the continuity of f{x) at the ends of the integration 
domain [a, b] can only be done from a sampling of its computed values, {fi = 
fl{f{xi))\i = 0, 1, ■ ■ ■ , m}, over a set of m + 1 machine number arguments 

Sm{Xr) = {Xi e V{Xr)\i = 0, 1, ■ ■ • ,m}, Xr G Sm{Xr), 771 > 3. If {f{Xi))\i = 

0, 1, • • • ,m} denote the actual values of f{xi) over Sm{xr), then, due to the 
round-off, f{xi) — fi ^ in general. As a consequence, the best information 
on the smoothness properties of f{x) at Xr following from the set {xi, fi} is 
obtained from the scrutiny of the properties of a second degree polynomial 
least squares fit to the floating point data. 

A problem in terms of machine number abscissas is obtained by the scale 
transformation 

Xi = Xo + ^ihr, 2 = 0, l,---,m; G Z; = 0, (1) 

where hr denotes the algebraic distance from Xr to its nearest machine num- 
ber inside [a, b]. This leads to the second degree fitting polynomial 

y2{xi) = aoTToi^i) + aiTTi{^i) + a2vr2(^i) , (2) 

spanned by the orthonormal basis polynomials Hki^i), /c = 0, 1, 2. 

A linear polynomial fitting over Sm{xr) is obtained provided a2TT2{0 is 
negligible everywhere at ^ G [^min.imax], imin = min{^o,6, ■ ■ ■ ,^m}, imax = 
max{^0)'Ci) ■ ■ ■ )'Cm}- Making all calculations requested by the least squares 
procedure then results in the statement of the theorem, QED. 
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Corollary 1 // we assume a minimal mesh sampling 5*3 (x^) characterized 
by the abscissa set = 0, = p > 0, ^2 = I^V > iii ^3 = <?; I^sl ^ ii; then 
the following smoothing criteria emerge: 



d2o - iidiQ ^ 0, 



p 



0, 



diQ - d. 
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0, dij = fi- fj. (3) 



Corollary 2 // we assume a minimal mesh sampling 5*3 (x^) characterized 
by the abscissa set = 0, = P ~ G = 9, = ^^^{Ci, ^2} ^ 1^3; then 
the following smoothing criteria emerge: 

pd2o - qdiQ ^ 0, qdso - rd2o ~ 0, rdio - pdso ~ 0. (4) 



Corollary 3 If the analysis issued the diagnostic of smooth f{x) at the end- 
point Xr, then the following estimate for the lateral first order derivative 
f\xr) inside [a, h] holds, 

f'(xr)^ = — Sidio, (5) 



m ^ m 



m + 1 ^-^ m + 1 ^-^ 

i=0 i=0 



4 Diagnostic uniqueness 

To allow useful inferences, the analysis done in the previous section has to 
be supplemented, on one side, with a quantitative measure of the smallness 
of the differences defined in Corollaries ^ and El and, on the other side, with 
qualitative criteria able to single out those specific unique features which 
characterize other integrand behaviours inside a mesoscopic neighbourhood 

of Xy. 

4.1. Smoothness threshold. The practical implementation of the 
smoothness criteria and Q compares the magnitudes of the expressions 
entering the left hand sides of these equations with an integrand dependent 
upper threshold, 

0<Tf = ro£omax{|/o|, I/2I, I/3I}, (7) 
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where e^, the machine epsilon with respect to addition, defines the machine 
accuracy dependent parameter of the solution, while Tq, < tq <ti , is a 
heuristic parameter intended to overcome possible diagnostic errors coming 
from the normal roundoff noise. In practice, we have found that a value 
To = 2^° resulted in adequate diagnostics for all the tested smooth case study 
functions /(x). 

The diagnostic reliability decreases as long as the magnitudes of one or 
more of the quantities entering the Eqs. Q or (0} get near Tf. 

4.2. Endpoint or outer singularity. The basic features of such a 
behaviour are: inward monotonically decreasing under di^i-i-di+i^i > 0; 
steep variation of f{x) over argument distances separated by one or a few 
machine numbers; inward monotonically decreasing 

If the sampling of Corollary ^ is generated, then the addition of two 
supplementary abscissas at + ^3)/2 and (^i + ^2)/2 respectively and the 
confirmation of the abovementioned three features once again over the result- 
ing smaller subranges distinguishes singular behaviour from an inner finite 
jump near Xr- 

4.3. Inner nearby singularity. The features of this kind of behaviour 
are mirror refiected with respect to those of an endpoint or outer singularity. 

4.4. Inner nearby finite jump. There are two hints suggesting this 
diagnostic. First, the sharp increase of the magnitude of one of the first 
order divided differences approximating f'{x) over the subranges defined by 
sampling. Second, the feature gets enhanced if a finer partition is defined 
over the subrange in question. 

4.5. Irregular behaviour. This negative diagnostic is usually associ- 
ated with the occurrence of rapid oscillations of f{x) which make the usual 
local quadrature rules ineffective. 

5 Code robustness: hardware and software 
environment 

The analysis described above yields diagnostics issued by a code which runs 
within an environment defined by the hardware and the software at hand. 
The code robustness, reliability, and portability are secured provided several 
delicate points are adequately solved. 

The last code versions were run on several PCs with Intel 4+, AMD32, 
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or AMD64 processors, Linux 2.4 or Linux 2.6+ operating systems, and the 
GNU gcc compiler incorporating Fortran 77. Early code versions were also 
run on SUN workstations in BLTP-JINR, or under Microsoft XP OS. 

The study of the conformity of the hardware and software of the above- 
mentioned systems to the IEEE 754 standard which governs the floating 
point arithmetic (see, e.g., [IHIIIII), revealed the occurrence of four instances 
where the requirements of the standard were more or less frequently infringed: 
(i) length of the significand; (ii) floating point comparisons; (iii) code opti- 
mization; (iv) underflow threshold. 

5.1. Length of the floating point double precision significand. 
Under the standard compliant value of the machine epsilon with respect 
to addition, Eq = 2~^^, often analysis failure has been noticed. This was 
identified to stem from the loss of the last, assumed significant, 53-rd hit 
under reuse of exact data after their RAM/cache storage. 

The simplest possible case study of hidden bit loss is illustrated in Fig. la: 
the quantities = 1 + keQ, Sq = 2~^^, k = —20(1)20, are computed and 
stored in an array (to force RAM/cache storage) and then the differences 
Sk+i,k = flfc+i ~ o-k are plotted. Under IEEE 754 standard observance, the 
constant answer, 6k+i,k = £o should had been obtained. However, this result 
is obtained indeed under Eq = 2~^^. 

Fig. lb illustrates the same effect when exploring integrand behaviour 
around a singular point on an example taken from QUADPACK, P, p. 110, 



\x^ + 2x- 2|i/2 ^|(a;-a;,)(x-a;^)|' 
where Xp = — 1, Xm = —Vs — 1. (8) 

The values {f{xk); Xk = fl{xp){l + keo); Eq = 2-^^\ k = -20(1)20} have 
been computed and compared with the exact ones. Several spuriously equal 
output pair are noticed at pairs of assumed neighbouring machine number 
arguments, which spoil the reliability of the analysis. 

Increase of Eq to = 2"^^ rules out such spurious pairs (Fig. Ic). 

5.2. Floating point comparisons. In the example (jH)), fl{xp) may be 
either transferred from a procedure to another one using the fh2{xp) value 
which retains the most significant 52 binary bits in the significand, or it may 
be directly computed in CPU from the original expression Xp = ^/S — 1, in 
which case the processor stored fhi^Xp) value retains the most significant 64 
binary bits. The IEEE 754 standard asks that flQi{xp) . EQ . 7/52 (Xp) = . TRUE . 
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None of the two compilers available to us (f77, C++ gcc) did obey to this 
requirement. 

Since the use of floating point comparison was unavoidable, special care 
was taken to exclude all the possible /Z52 to fl^ comparisons. 

5.3. Code optimization by the compiler. We might try to write a 
code in which use of /Z52 values is always secured in comparison operations by 
asking the transfer of each variable entering such operation to RAM/cache, 
followed by their transfer back to CPU. In its search for "efficiency" increase, 
the code optimization by the compiler finds such a trick "unnecessary", thus 
spoiling the output correctness. The observation is not singular jT2|. 

5.4. The underflow threshold. On all but one computers at our 
disposal, the standard value u = 2"^°^^ was found to be right. However, 
on one of the mentioned SUN workstations the code crashed aX u = 2~^°^^, 
while the value u = 2~^'^^^ was found to be OK. 

5.5. Catastrophic precision loss in the neighbourhood of a non- 
zero singular point. Two supplementary features present in figures lb 
and Ic deserve consideration: (i) The computed function values are different 
from the exact ones starting with the most significant bit. (ii) The use of the 
primary QUADPACK expression hsted in Eq. (jH)) (data labelled "153", "152") 
infringes the left-right symmetry of the exact data. 

In spite of this severe precision loss due to cancellation by subtraction, 
correct inferences based on the use of the unique features characterizing a 
singular behaviour (paragraph 4.2) are still possible due to the fact that the 
computed "wrong data" preserve the ordering relationships characteristic to 
the "true data". 

If a substitution of variable which moves the singularity to the origin is 
possible, then all the difficulties enumerated at paragraphs 5.2, 5.3, and 5.5 
are completely removed (Fig. Id). 

6 Conclusions 

The boundary layer problem asks for accurate and reproducible diagnostics 
of integrand f{x) behaviour at the endpoints a and 6 of a finite integration 
domain [a,b]. 

In the present paper we have discussed several critical issues which dra- 
matically infiuence code robustness and reliability. Generalizations of the re- 
sults previously reported in (9j allows significant improvement of code quality 
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by: 

(i) Definition of smootli bcliaviour from an unrestricted least squares 
analysis over small (mesoscopic) neighbourhoods of the endpoints a and b. 
This secures the derivation of continuity criteria valid everywhere over the 
mesoscopic range where the analysis is done. 

(ii) Formulation of qualitative diagnostic criteria which reliably single 
out the various kinds of integrand behaviour even under severely damaged 
accuracy of the computed data. 

(iii) Derivation of accurate tests for reliable definition of the machine 
epsilon with respect to the addition. 

(iv) Identification of the critical hardware and software features which 
could spoil the correctness of the diagnostics by deviation from the IEEE 
754 standard and code reformulation such as to become insensitive to such 
drawback of the computing environment. 

The correct solution of the boundary layer problem is the first from a 
set of hierarchically ordered problems the solutions of which should allow a 
priori Bayesian inferences on efficient and reliable mesh generation within 
automatic adaptive quadrature. The solution of such problems is planned to 
be discussed in subsequent reports. 
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Fig. 1: Influence of machine epsilon definition on output reliabiUty. (a) Check of 
the fulfilment of the reliability criterion + (/c + l)eo) — /^l + ^^o) = £o- (Data 
set "o53" shows that our system is not compliant to the IEEE 754 standard value 
eo = 2~^^; data set "o52" shows that £o = 2~^^ is the right choice.) (b)-(d) Com- 
parison of two code outputs with the true results for the case study function (8): 
(b) - spuriously equal function pair values occur under eo = 2~^^ at arguments 
intended to represent neighbouring machine numbers; (c) - the spuriously equal 
pairs disappear, but heavy precision loss and output precision path dependence 
are still present near the singular point = \/3 — 1 under eq = 2"^^ ; (d) - all 
difficulties are solved if singularity is moved to the origin, where Eq = u = 2"^''^^. 
(Legend prefixes: "t" - true results; "c" - unique fioating point value of Xp used 
inside all procedures; "1" ~ calculation of f{x) is done using a locally processor 
produced approximation for Xp, which is more accurate than that transferred inbe- 
tween other procedures.) 
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